Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach
Human visual attention is a complex phenomenon. A computational modeling of
this phenomenon must take into account where people look in order to evaluate
which are the salient locations (spatial distribution of the fixations), when
they look in those locations to understand the temporal development of the
exploration (temporal order of the fixations), and how they move from one
location to another with respect to the dynamics of the scene and the mechanics
of the eyes (dynamics). State-of-the-art models focus on learning saliency maps
from human data, a process that only takes into account the spatial component
of the phenomenon and ignores its temporal and dynamical counterparts. In this
work we focus on the evaluation methodology of models of human visual
attention. We underline the limits of the current metrics for saliency
prediction and scanpath similarity, and we introduce a statistical measure for
the evaluation of the dynamics of the simulated eye movements. While deep
learning models achieve astonishing performance in saliency prediction, our
analysis shows their limitations in capturing the dynamics of the process. We
find that unsupervised gravitational models, despite their simplicity,
outperform all competitors. Finally, exploiting a crowdsourcing platform, we
present a study aimed at evaluating how plausible the scanpaths generated by
the unsupervised gravitational models appear to naive and expert human
observers.
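
As a concrete reference point for the saliency-prediction metrics discussed above, the following is a minimal sketch of Normalized Scanpath Saliency (NSS), one of the standard spatial metrics whose limits the paper examines; it is not the statistical measure the paper introduces, and the array shapes and example values are illustrative assumptions.

    import numpy as np

    def nss(saliency_map, fixations):
        # Normalized Scanpath Saliency: standardize the predicted map,
        # then average it at the human fixation locations.
        s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
        return float(np.mean([s[r, c] for r, c in fixations]))

    # Illustrative usage: a random map scored against three fixations.
    rng = np.random.default_rng(0)
    smap = rng.random((480, 640))
    print(nss(smap, [(100, 200), (240, 320), (400, 500)]))

Note that NSS, like the other common metrics, scores only where fixations land; it is insensitive to their order and dynamics, which is the gap the measure introduced in the paper targets.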
A machine learning approach for detecting cognitive interference based on eye-tracking data
The Stroop test evaluates the ability to inhibit cognitive interference. This interference occurs when the processing of one stimulus characteristic affects the simultaneous processing of another attribute of the same stimulus. Eye movements are an indicator of the individual attention load required to inhibit cognitive interference. We used an eye tracker to collect eye movement data from more than 60 subjects, each performing four different but similar tasks (some with cognitive interference and some without). After extracting features related to fixations, saccades, and gaze trajectory, we trained different machine learning models to recognize tasks performed in the different conditions (i.e., with or without interference). The models achieved good classification performance when distinguishing between similar tasks performed with or without cognitive interference. This suggests the presence of characterizing patterns common among subjects, which machine learning algorithms can capture despite the individual variability of visual behavior.
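
The pipeline described above, per-trial features from fixations, saccades, and gaze trajectory followed by supervised classification, can be sketched as follows; the feature set, the synthetic data, and the choice of a random-forest classifier are illustrative assumptions, not the paper's actual configuration.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Assumed per-trial features: mean fixation duration, fixation count,
    # mean saccade amplitude, total scanpath length (synthetic stand-ins).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(240, 4))        # 240 synthetic trials, 4 features
    y = rng.integers(0, 2, size=240)     # 1 = interference, 0 = none

    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validation
    print(f"mean accuracy: {scores.mean():.2f}")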
Behind the Machine's Gaze: Biologically Constrained Neural Networks Exhibit Human-like Visual Attention
By and large, existing computational models of visual attention tacitly
assume perfect vision and full access to the stimulus and thereby deviate from
foveated biological vision. Moreover, modelling top-down attention is generally
reduced to the integration of semantic features without incorporating the
signal of high-level visual tasks that have been shown to partially guide human
attention. We propose the Neural Visual Attention (NeVA) algorithm to generate
visual scanpaths in a top-down manner. With our method, we explore the ability
of neural networks on which we impose the biological constraints of foveated
vision to generate human-like scanpaths. Thereby, the scanpaths are generated
to maximize the performance with respect to the underlying visual task (i.e.,
classification or reconstruction). Extensive experiments show that the proposed
method outperforms state-of-the-art unsupervised human attention models in
terms of similarity to human scanpaths. Additionally, the flexibility of the
framework allows us to quantitatively investigate the role of different tasks
in the generated visual behaviours. Finally, we demonstrate the superiority of
the approach in a novel experiment that investigates the utility of scanpaths
in real-world applications under imperfect viewing conditions.
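
A central ingredient of the method is the foveation constraint: full acuity at the current fixation and increasing blur toward the periphery. A minimal sketch of such an operator is given below; the Gaussian blending and the parameter values are assumptions for illustration, not the authors' exact operator.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def foveate(image, fixation, fovea_radius=40.0, blur_sigma=5.0):
        # image: 2-D grayscale array; fixation: (row, col) coordinates.
        # Blend the sharp image with a blurred copy, weighting by an
        # acuity mask that is 1 at the fixation and decays with distance.
        blurred = gaussian_filter(image, sigma=blur_sigma)
        rows, cols = np.indices(image.shape)
        dist = np.hypot(rows - fixation[0], cols - fixation[1])
        acuity = np.exp(-(dist / fovea_radius) ** 2)
        return acuity * image + (1.0 - acuity) * blurred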
Gravitational Models Explain Shifts on Human Visual Attention
Visual attention refers to the human brain's ability to select relevant
sensory information for preferential processing, improving performance in
visual and cognitive tasks. It proceeds in two phases: one in which visual
feature maps are acquired and processed in parallel, and another in which the
information from these maps is merged in order to select a single location to
be attended for further, more complex computations and reasoning. Its
computational description is challenging, especially if the temporal dynamics
of the process are taken into account. Numerous methods to estimate saliency
have been proposed in the last three decades. They achieve almost perfect
performance in estimating saliency at the pixel level, but the way they
generate shifts in visual attention fully depends on winner-take-all (WTA)
circuitry. WTA is implemented by the biological hardware in order to select a
location with maximum saliency, towards which to direct overt attention. In
this paper we propose a gravitational model (GRAV) to describe the attentional
shifts. Every single feature acts as an attractor, and the shifts are the
result of the joint effects of the attractors. In the current framework, the
assumption of a single, centralized saliency map is no longer necessary, though
still plausible. Quantitative results on two large image datasets show that
this model predicts shifts more accurately than winner-take-all.
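
A toy version of the gravitational dynamics described above can be written in a few lines: each salient feature is an attractor whose pull decays with distance, and the gaze point moves under the attractors' joint force. The force law, the damping term, and the example values are illustrative assumptions, not the paper's GRAV equations.

    import numpy as np

    def gravitational_step(gaze, velocity, attractors, masses,
                           dt=0.1, damping=0.9, eps=1.0):
        # attractors: (N, 2) feature positions; masses: (N,) strengths
        # (e.g. saliency values). Softened inverse-square attraction.
        diff = attractors - gaze
        dist = np.linalg.norm(diff, axis=1) + eps
        force = (masses[:, None] * diff / dist[:, None] ** 3).sum(axis=0)
        velocity = damping * velocity + dt * force
        return gaze + dt * velocity, velocity

    # Illustrative run: gaze drifts toward the stronger of two attractors.
    gaze, vel = np.array([0.0, 0.0]), np.zeros(2)
    attractors = np.array([[10.0, 0.0], [0.0, 5.0]])
    masses = np.array([2.0, 1.0])
    for _ in range(100):
        gaze, vel = gravitational_step(gaze, vel, attractors, masses)

No single saliency map is needed in this formulation: the shift emerges from the superposition of per-feature forces, which is the sense in which the abstract calls the centralized saliency map unnecessary though still plausible.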
Simulating Human Gaze with Neural Visual Attention
Existing models of human visual attention are generally unable to incorporate
direct task guidance and therefore cannot model an intent or goal when
exploring a scene. To integrate guidance of any downstream visual task into
attention modeling, we propose the Neural Visual Attention (NeVA) algorithm. To
this end, we impose on neural networks the biological constraint of foveated
vision and train an attention mechanism to generate visual explorations that
maximize the performance with respect to the downstream task. We observe that
biologically constrained neural networks generate human-like scanpaths without
being trained for this objective. Extensive experiments on three common
benchmark datasets show that our method outperforms state-of-the-art
unsupervised human attention models in generating human-like scanpaths.
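
The task-driven exploration described here can be sketched as a greedy loop: at each step, score candidate fixations by the downstream task loss obtained from the foveated view they produce, and take the best one. The candidate grid, the task_loss callable, the foveate helper (as in the earlier sketch), and the inhibition-of-return step are illustrative assumptions, not the released NeVA implementation.

    import numpy as np

    def generate_scanpath(image, task_loss, foveate, n_fixations=5, grid=8):
        # task_loss: callable mapping a foveated image to a scalar loss.
        # foveate: callable (image, (row, col)) -> foveated image.
        h, w = image.shape[:2]
        candidates = [(int(r), int(c))
                      for r in np.linspace(0, h - 1, grid)
                      for c in np.linspace(0, w - 1, grid)]
        scanpath = []
        for _ in range(n_fixations):
            losses = [task_loss(foveate(image, p)) for p in candidates]
            best = candidates[int(np.argmin(losses))]
            scanpath.append(best)
            candidates.remove(best)  # crude inhibition of return (assumption)
        return scanpath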